- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1.1k
 
Avoid capturing lambda allocation when retrieving existing meters #6670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In particular focuses on Timer, LongTaskTimer, and Counter as these are the types expected to be retrieved (non-functional) the most. Avoiding this allocation makes the retrieval more performant and generates less garbage, causing less GC pressure.
| 
           What we had before these changes (from the "after" in #6661): After these changes, retrieving e.g. a Timer will only allocate the  And with less allocation, the benchmark results show retrieving an existing Timer is even faster than in #6661. I also realized that having any stale meters affects the performance of retrieving an existing meter because we have to check the stale meter set, whereas we have an early return if the set is empty. This is yet another reason people should heed the warning to avoid setups that result in stale meters. I modified the benchmark and ran it without any stale meters to show the difference. I've summarized the results in the table below. MeterRegistrationBenchmark#registerExistingTimer benchmark results
  | 
    
| * @param id The identifier for this counter. | ||
| * @return A new or existing counter. | ||
| * @implNote Avoids allocation in the case the meter is already registered, | ||
| * particularly capturing lambdas. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these implementation notes to the JavaDoc in a few places. It's quite easy to not notice that some code written is a capturing lambda that will likely cause allocation. I suspect we were not consciously aware of that for this code before (I don't recall being aware of that in this part of the code).
We should come up with a way to make sure this doesn't regress. We have some tests that assert there were no allocations in some executed code. Perhaps we could do similar with these. Asserting no allocations is generally easier than asserting a specific number of bytes allocated, since that depends on the JVM and environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added tests for this in a subsequent commit. From the non-public methods asserting no allocations worked fine, but to assert only the ID allocation from a public method required adding public API variants of existing API that takes a Tags instance to avoid calling Tags.of which may allocate before JIT optimizations.
| PauseDetector pauseDetectorOverride) { | ||
| return registerMeterIfNecessary(Timer.class, id, distributionStatisticConfig, (id2, filteredConfig) -> { | ||
| Meter.Id withUnit = id2.withBaseUnit(getBaseTimeUnitStr()); | ||
| return newTimer(withUnit, filteredConfig.merge(defaultHistogramConfig()), pauseDetectorOverride); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging with the default histogram config was moved into getOrCreateMeter when the passed DistributionStatisticConfig is not null. I think I initially did this in an effort to get rid of references in the lambda body that would be captured. Since things ended up with the registry as a parameter, I'm not sure it matters where the code is; it should be the same either way (famous last words). I suppose it's a little less duplication to have it in getOrCreateMeter, but it makes the method body longer in a critical function, which may make it harder to read the code.
Separately, it occurs to me that the method defaultHistogramConfig() will most likely always return an equivalent instance but it builds and merges it every time in most implementations. This is wasteful and could likely be improved. It only affects new meters, but it's still wasteful if it's always returning an equivalent config. I did at one point try one naive approach which didn't work, so it can be left as future work to be considered.
| return registerMeterIfNecessary(meterClass, id, null, (id2, conf) -> builder.apply(id2), noopBuilder); | ||
| BiFunction<MeterRegistry, Meter.Id, M> meterSupplier, Function<Meter.Id, M> noopBuilder) { | ||
| return registerMeterIfNecessary(meterClass, id, null, null, | ||
| (registry, mappedId, conf, pd) -> meterSupplier.apply(registry, mappedId), noopBuilder); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adapting of the BiFunction to NewMeterSupplier is a capturing lambda that allocates (it did before these changes too), which is why we avoid calling this variant of registerMeterIfNecessary from the Counter call stack. It's convenient though, so we still use it for Function meters that should only ever be registered once anyway.
| 
               | 
          ||
| private Meter getOrCreateMeter(@Nullable DistributionStatisticConfig config, | ||
| BiFunction<Meter.Id, /* Nullable Generic */ DistributionStatisticConfig, ? extends Meter> builder, | ||
| @Nullable PauseDetector specificPauseDetector, NewMeterSupplier<? extends Meter> meterSupplier, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We take the pause detector as a parameter here so we don't need to capture it in the lambda for the NewMeterSupplier (previously the BiFunction).
7addd49    to
    3b1ccc0      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added allocation tests, which needed some new public API to avoid calling Tags.of with something that's already a Tags instance because it can allocate before JIT optimizes. We looked at reworking the method to avoid that case, but the results were surprising. See #6148. Perhaps we can revisit that at some point.
| * @return A new or existing long task timer. | ||
| * @since 1.16.0 | ||
| */ | ||
| public LongTaskTimer longTaskTimer(String name, Tags tags) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added to avoid the Tags.of call when we already have a Tags instance. After JIT warm-up, it should not make any difference, but until then, calls to Tags.of with a Tags instance can allocate an ArrayIterator. This was an issue in the added unit test where JIT will not have eliminated that allocation.
| */ | ||
| public Counter counter(String name, Iterable<Tag> tags) { | ||
| return Counter.builder(name).tags(tags).register(this); | ||
| return counter(name, Tags.of(tags)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I left this with the builder to prove the point that JIT can eliminate the apparent allocation that was here. While that's still true, this change along with the new counter method that takes a Tags parameter avoids some allocations even before JIT optimizes. This was useful for the added unit test asserting allocated bytes, where JIT won't have optimized the code. It also helps reduce allocation in applications before JIT optimization (e.g. start-up or low traffic applications).
| * @return A new or existing timer. | ||
| * @since 1.16.0 | ||
| */ | ||
| public Timer timer(String name, Tags tags) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this added API (and the equivalent counter/ltt API) non-public if we don't want to add API that's mostly sugar which could overwhelm users with options, but it seems harmless enough, and as mentioned in another comment, there are cases it can improve performance before JIT optimization.
3b1ccc0    to
    b1336c5      
    Compare
  
    | 
           I'm sure there are ways in which this could be done better, but I'll merge for now. We have some time until the next release, and all of this is implementation details other than the new API that is overloads of existing API. Feedback post merge is welcome.  | 
    
In particular focuses on Timer, LongTaskTimer, and Counter as these are the types expected to be retrieved the most on hot code paths (as opposed to function meters that are only registered once). Avoiding this allocation makes the retrieval faster and generates less garbage, causing less GC pressure.
The original intent was to pass a code block (lambda body) that creates the new meter if it doesn't already exist. If the meter does already exist, it is returned without running the code block. However, because the lambda needed to capture variables outside its scope, it resulted in allocation to pass the lambda, whether it was run or not.
Similar to #6661, this helps with Observation-based instrumentation that's using the
DefaultMeterObservationHandler, which will retrieve a LongTaskTimer and Timer on start and stop respectively.Adds new public overloads of timer, longTaskTimer, and counter methods that take
Tagsbecause the existing methods takeIterable<Tag>which meansTags.ofneeds to be called on it, which can result in allocation (to check the size of the Iterable) before JIT optimization.